02 - Facies Classification using TPOT

George Crowther - https://www.linkedin.com/in/george-crowther-9669a931?trk=hp-identity-name

In this second attempt, I've updated some of the feature engineering before re-training an extra trees classifier on the data

1. Data Loading and Initial Observations


In [1]:
# Initial imports for reading data and first observations
import pandas as pd
import bokeh.plotting as bk
import numpy as np

from sklearn import preprocessing
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix

from tpot import TPOTClassifier

import sys
sys.path.append(r'C:\Users\george.crowther\Documents\Python\Projects\2016-ml-contest-master')
from classification_utilities import display_cm, display_adj_cm

bk.output_notebook()


Loading BokehJS ...

In [2]:
# Input file paths
train_path = r'..\training_data.csv'
test_path = r'.\validation_data_nofacies.csv'

# Read training data to dataframe
train = pd.read_csv(train_path)

# TPOT library requires that the target class is renamed to 'class'
train.rename(columns={'Facies': 'class'}, inplace=True)

In [6]:
train.head()


Out[6]:
class Formation Well Name Depth GR ILD_log10 DeltaPHI PHIND PE NM_M RELPOS
0 3 A1 SH SHRIMPLIN 2793.0 77.45 0.664 9.9 11.915 4.6 1 1.000
1 3 A1 SH SHRIMPLIN 2793.5 78.26 0.661 14.2 12.565 4.1 1 0.979
2 3 A1 SH SHRIMPLIN 2794.0 79.05 0.658 14.8 13.050 3.6 1 0.957
3 3 A1 SH SHRIMPLIN 2794.5 86.10 0.655 13.9 13.115 3.5 1 0.936
4 3 A1 SH SHRIMPLIN 2795.0 74.58 0.647 13.5 13.300 3.4 1 0.915

In [7]:
train.describe()


Out[7]:
class Depth GR ILD_log10 DeltaPHI PHIND PE NM_M RELPOS
count 3232.000000 3232.000000 3232.000000 3232.000000 3232.000000 3232.000000 3232.000000 3232.000000 3232.000000
mean 4.422030 2875.824567 66.135769 0.642719 3.559642 13.483213 3.725014 1.498453 0.520287
std 2.504243 131.006274 30.854826 0.241845 5.228948 7.698980 0.896152 0.500075 0.286792
min 1.000000 2573.500000 13.250000 -0.025949 -21.832000 0.550000 0.200000 1.000000 0.010000
25% 2.000000 2791.000000 46.918750 0.492750 1.163750 8.346750 3.100000 1.000000 0.273000
50% 4.000000 2893.500000 65.721500 0.624437 3.500000 12.150000 3.551500 1.000000 0.526000
75% 6.000000 2980.000000 79.626250 0.812735 6.432500 16.453750 4.300000 2.000000 0.767250
max 9.000000 3122.500000 361.150000 1.480000 18.600000 84.400000 8.094000 2.000000 1.000000

Feature Engineering and Creation

Again, as with the previous result, the method here is somewhat brute-force, looking at the differences between each sample and it's formation mean/median, its above formation lower sample and below formation upper sample. There could definitely be more metrics, and undoubtedly more informed metrics to pull in this manner, these are arguably somewhat naieve.


In [24]:
def feature_extraction(train):
    #------------------------------------
    # Split and separate formation names into 
    for i, value in enumerate(train.Formation.unique()):
        name_a = value.split(' ')[0]
        name_b = value.split(' ')[1]
        if name_a not in train.columns:
            train[name_a] = 0
        if name_b not in train.columns:
            train[name_b] = 0

        train.loc[train.Formation == value, name_a] = 1
        train.loc[train.Formation == value, name_b] = 1
    #------------------------------------
    # Replace formation names with values
    for i, value in enumerate(train['Formation'].unique()):
        train.loc[train['Formation'] == value, 'Formation'] = i

    #------------------------------------
    # Going to take the difference of each sample from the formation mean and median for each well for each measured parameter
    # This will add a 0 value column for each potential value
    columns = ['Formation', 'Depth', 'GR', 'ILD_log10', 'DeltaPHI', 'PHIND', 'PE', 'NM_M', 'RELPOS']

    above_columns = ['above_delta_' + col for col in columns]
    below_columns = ['below_delta_' + col for col in columns]
    formation_columns = ['formation_delta_' + col for col in columns]
    formation_med_columns = ['formation_delta_med_' + col for col in columns]

    def add_empty_columns(df, column_list):
        for column in column_list:
            df[column] = 0

    for column_list in [above_columns, below_columns, formation_columns, formation_med_columns]:
        add_empty_columns(train, column_list)

    #-------------------------------------------
    # Group data by well, sort by depth, then groupby formation
    # Take mean, median, top and bottom (by depth) values for each sub group
    # Add feature which is the difference of the sample from the mean for each formation and its adjacent formations
    # TBD - un-log 'ILD log10' prior to mean, the re-log
    for i, group in train.groupby('Well Name'):
        iteration = 0
        sorted_group = group.sort_values('Depth')
        for j, sub_group in sorted_group.groupby('Formation'):

            means = sub_group[columns].mean()
            medians = sub_group[columns].median()
            top = sub_group.iloc[0][columns]

            if iteration == 0:
                above_group = sub_group
            else:
                above_means = above_group[columns].mean()
                above_bottom = above_group.iloc[-1][columns]
                train.loc[sub_group.index, above_columns] = (train.loc[sub_group.index, columns] - above_bottom).values
                train.loc[above_group.index, below_columns] = (train.loc[sub_group.index, columns] - top).values

            train.loc[sub_group.index, formation_columns] = (train.loc[sub_group.index, columns] - means).values
            train.loc[sub_group.index, formation_med_columns] = (train.loc[sub_group.index, columns] - medians).values

            above_group = sub_group
            iteration += 1
    
    return train

In [15]:
facies_labels = ['SS', 'CSiS', 'FSiS', 'SiSh', 'MS',
                 'WS', 'D','PS', 'BS']
model_columns = train.columns[11:]

4. TPOT

TPOT uses a genetic algorithm to tune model parameters for the most effective fit. This can take quite a while to process if you want to re-run this part!


In [18]:
# Input file paths
train_path = r'..\training_data.csv'

# Read training data to dataframe
train = pd.read_csv(train_path)

# TPOT library requires that the target class is renamed to 'class'
train.rename(columns={'Facies': 'class'}, inplace=True)

train = feature_extraction(train)

In [8]:
alt_model_columns = ['GR', 'ILD_log10',
       'DeltaPHI', 'PHIND', 'PE', 'NM_M', 'RELPOS', 'A1', 'SH', 'LM', 'B1',
       'B2', 'B3', 'B4', 'B5', 'C', 'above_delta_Formation',
       'above_delta_Depth', 'above_delta_GR', 'above_delta_ILD_log10',
       'above_delta_DeltaPHI', 'above_delta_PHIND', 'above_delta_PE',
       'above_delta_NM_M', 'above_delta_RELPOS', 'below_delta_Formation',
       'below_delta_Depth', 'below_delta_GR', 'below_delta_ILD_log10',
       'below_delta_DeltaPHI', 'below_delta_PHIND', 'below_delta_PE',
       'below_delta_NM_M', 'below_delta_RELPOS', 'formation_delta_Formation',
       'formation_delta_Depth', 'formation_delta_GR',
       'formation_delta_ILD_log10', 'formation_delta_DeltaPHI',
       'formation_delta_PHIND', 'formation_delta_PE', 'formation_delta_NM_M',
       'formation_delta_RELPOS', 'formation_delta_med_Formation',
       'formation_delta_med_Depth', 'formation_delta_med_GR',
       'formation_delta_med_ILD_log10', 'formation_delta_med_DeltaPHI',
       'formation_delta_med_PHIND', 'formation_delta_med_PE',
       'formation_delta_med_NM_M', 'formation_delta_med_RELPOS']

In [9]:
#-------------------------------
# Z-scale normalisation of features. 
# Should probably exclude boolean features from normalisation, though should make nominal difference.
std_scaler = preprocessing.StandardScaler().fit(train[alt_model_columns])
norm = std_scaler.transform(train[alt_model_columns])

norm_frame = train
for i, column in enumerate(alt_model_columns):
    norm_frame.loc[:, column] = norm[:, i]

train = norm_frame

In [155]:
train[alt_model_columns].describe()


Out[155]:
GR ILD_log10 DeltaPHI PHIND PE NM_M RELPOS A1 SH LM ... formation_delta_RELPOS formation_delta_med_Formation formation_delta_med_Depth formation_delta_med_GR formation_delta_med_ILD_log10 formation_delta_med_DeltaPHI formation_delta_med_PHIND formation_delta_med_PE formation_delta_med_NM_M formation_delta_med_RELPOS
count 3.232000e+03 3.232000e+03 3.232000e+03 3.232000e+03 3.232000e+03 3.232000e+03 3.232000e+03 3.232000e+03 3.232000e+03 3.232000e+03 ... 3.232000e+03 3232.0 3.232000e+03 3.232000e+03 3.232000e+03 3.232000e+03 3.232000e+03 3.232000e+03 3.232000e+03 3.232000e+03
mean 3.209754e-16 4.484861e-16 -3.517538e-17 -1.363046e-16 5.100431e-16 -3.517538e-17 1.429000e-16 9.233538e-17 -1.934646e-16 -7.035077e-17 ... -4.232038e-17 0.0 1.319077e-17 3.077846e-17 -8.793846e-18 -5.496154e-19 -1.758769e-17 5.496154e-18 3.077846e-17 2.967923e-17
std 1.000155e+00 1.000155e+00 1.000155e+00 1.000155e+00 1.000155e+00 1.000155e+00 1.000155e+00 1.000155e+00 1.000155e+00 1.000155e+00 ... 1.000155e+00 0.0 1.000155e+00 1.000155e+00 1.000155e+00 1.000155e+00 1.000155e+00 1.000155e+00 1.000155e+00 1.000155e+00
min -1.714285e+00 -2.765296e+00 -4.856726e+00 -1.680121e+00 -3.934108e+00 -9.969107e-01 -1.779569e+00 -5.625806e-01 -9.956777e-01 -1.004341e+00 ... -2.494438e+00 0.0 -1.138142e+01 -2.715730e+00 -4.029226e+00 -6.260693e+00 -4.250182e+00 -5.731779e+00 -8.744320e+00 -2.549279e+00
25% -6.229169e-01 -6.202015e-01 -4.582685e-01 -6.672647e-01 -6.975496e-01 -9.969107e-01 -8.623869e-01 -5.625806e-01 -9.956777e-01 -1.004341e+00 ... -8.358484e-01 0.0 -5.487312e-01 -4.287593e-01 -4.695474e-01 -3.941843e-01 -4.363377e-01 -4.660069e-01 3.258753e-02 -8.335645e-01
50% -1.342848e-02 -7.560721e-02 -1.140783e-02 -1.731943e-01 -1.936510e-01 -9.969107e-01 1.992191e-02 -5.625806e-01 -9.956777e-01 9.956777e-01 ... -2.351765e-03 0.0 1.056513e-02 -1.604354e-01 3.090394e-03 1.516462e-02 -1.458224e-01 5.452730e-02 3.258753e-02 -4.184040e-03
75% 4.372920e-01 7.031033e-01 5.494992e-01 3.858948e-01 6.417158e-01 1.003099e+00 8.612539e-01 -5.625806e-01 1.004341e+00 9.956777e-01 ... 8.400663e-01 0.0 5.698615e-01 1.757048e-01 5.204175e-01 4.910436e-01 2.205165e-01 5.211649e-01 3.258753e-02 8.323156e-01
max 9.562844e+00 3.462598e+00 2.876809e+00 9.212618e+00 4.876027e+00 1.003099e+00 1.672943e+00 1.777523e+00 1.004341e+00 9.956777e-01 ... 2.495892e+00 0.0 9.812970e+00 1.110082e+01 3.929107e+00 4.480679e+00 1.127884e+01 5.779237e+00 8.809495e+00 2.879070e+00

8 rows × 52 columns


In [10]:
#------------------------------------
# Train test split
alt_train_f, alt_test_f = train_test_split(train, test_size = 0.1, 
                                   random_state = 68)

In [12]:
# Setup TPOT classifier and train
alt_tpot = TPOTClassifier(verbosity = 2, generations = 5, max_eval_time_mins = 60)
alt_tpot.fit(alt_train_f[alt_model_columns], alt_train_f['class'])


Optimization Progress:  17%|█████████████████████▌                                                                                                           | 100/600 [26:17<7:20:59, 52.92s/pipeline]
Generation 1 - Current best internal CV score: 0.9118500821427303
Optimization Progress:  32%|████████████████████████████████████████▋                                                                                      | 192/600 [1:00:59<2:45:30, 24.34s/pipeline]
Generation 2 - Current best internal CV score: 0.9118500821427303
Optimization Progress:  48%|█████████████████████████████████████████████████████████████▍                                                                 | 290/600 [1:38:36<2:25:11, 28.10s/pipeline]
Generation 3 - Current best internal CV score: 0.91286129457966
Optimization Progress:  65%|██████████████████████████████████████████████████████████████████████████████████▏                                            | 388/600 [2:12:31<1:02:33, 17.71s/pipeline]
Generation 4 - Current best internal CV score: 0.91286129457966
Optimization Progress:  82%|█████████████████████████████████████████████████████████████████████████████████████████████████████████▎                       | 490/600 [2:52:00<41:39, 22.72s/pipeline]
Generation 5 - Current best internal CV score: 0.91286129457966

Best pipeline: ExtraTreesClassifier(input_matrix, 41, 0.47999999999999998)

In [22]:
print(alt_tpot.score(alt_test_f[alt_model_columns], alt_test_f['class']))
alt_tpot.export('02 contest_export.py')


0.911214630264

In [49]:
result = alt_tpot.predict(train[alt_model_columns])

conf = confusion_matrix(train['class'], result)
display_cm(conf, facies_labels, hide_zeros=True, display_metrics = True)

def accuracy(conf):
    total_correct = 0.
    nb_classes = conf.shape[0]
    for i in np.arange(0,nb_classes):
        total_correct += conf[i][i]
    acc = total_correct/sum(sum(conf))
    return acc

print(accuracy(conf))

adjacent_facies = np.array([[1], [0,2], [1], [4], [3,5], [4,6,7], [5,7], [5,6,8], [6,7]])

def accuracy_adjacent(conf, adjacent_facies):
    nb_classes = conf.shape[0]
    total_correct = 0.
    for i in np.arange(0,nb_classes):
        total_correct += conf[i][i]
        for j in adjacent_facies[i]:
            total_correct += conf[i][j]
    return total_correct / sum(sum(conf))

print(accuracy_adjacent(conf, adjacent_facies))


     Pred    SS  CSiS  FSiS  SiSh    MS    WS     D    PS    BS Total
     True
       SS   257     2                                             259
     CSiS     1   730     7                                       738
     FSiS           6   608     1                                 615
     SiSh                     182           2                     184
       MS           1           2   209     3           2         217
       WS                       3     1   455           3         462
        D                       2                96                98
       PS                       1     2     7     1   486     1   498
       BS                                               1   160   161

Precision  1.00  0.99  0.99  0.95  0.99  0.97  0.99  0.99  0.99  0.98
   Recall  0.99  0.99  0.99  0.99  0.96  0.98  0.98  0.98  0.99  0.98
       F1  0.99  0.99  0.99  0.97  0.97  0.98  0.98  0.98  0.99  0.98
0.984839108911
0.995668316832

Workflow for Test Data

All the code below here can be re-run to load the model, fit it and predict on the test dataset.


In [40]:
test_path = r'..\validation_data_nofacies.csv'

# Read training data to dataframe
test = pd.read_csv(test_path)

# Rename 'Facies'
test.rename(columns={'Facies': 'class'}, inplace=True)

frame = feature_extraction(test)

In [41]:
frame.describe()


Out[41]:
Depth GR ILD_log10 DeltaPHI PHIND PE NM_M RELPOS A1 SH ... formation_delta_RELPOS formation_delta_med_Formation formation_delta_med_Depth formation_delta_med_GR formation_delta_med_ILD_log10 formation_delta_med_DeltaPHI formation_delta_med_PHIND formation_delta_med_PE formation_delta_med_NM_M formation_delta_med_RELPOS
count 830.000000 830.00000 830.000000 830.000000 830.000000 830.000000 830.000000 830.000000 830.000000 830.000000 ... 8.300000e+02 830.0 830.000000 830.000000 830.000000 830.000000 830.000000 830.000000 830.000000 830.000000
mean 2987.070482 57.61173 0.666312 2.851964 11.655277 3.654178 1.678313 0.535807 0.266265 0.318072 ... 6.420567e-18 0.0 0.010843 3.602400 -0.006436 0.078524 0.472410 0.000482 -0.003614 0.002496
std 94.391925 27.52774 0.288367 3.442074 5.190236 0.649793 0.467405 0.283062 0.442271 0.466009 ... 2.765885e-01 0.0 8.945698 24.054012 0.209007 2.378272 4.271225 0.471993 0.104132 0.276883
min 2808.000000 12.03600 -0.468000 -8.900000 1.855000 2.113000 1.000000 0.013000 0.000000 0.000000 ... -4.959667e-01 0.0 -25.000000 -40.300000 -0.718000 -12.700000 -14.395000 -2.433000 -1.000000 -0.506000
25% 2911.625000 36.77325 0.541000 0.411250 7.700000 3.171500 1.000000 0.300000 0.000000 0.000000 ... -2.322438e-01 0.0 -4.500000 -8.233500 -0.098750 -1.237500 -1.940000 -0.232125 0.000000 -0.230000
50% 2993.750000 58.34450 0.675000 2.397500 10.950000 3.515500 2.000000 0.547500 0.000000 0.000000 ... -1.416603e-04 0.0 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
75% 3055.375000 73.05150 0.850750 4.600000 14.793750 4.191500 2.000000 0.778000 1.000000 1.000000 ... 2.336599e-01 0.0 4.500000 10.198000 0.096750 1.292500 2.703750 0.241500 0.000000 0.236375
max 3160.500000 220.41300 1.507000 16.500000 31.335000 6.321000 2.000000 1.000000 1.000000 1.000000 ... 5.172903e-01 0.0 25.000000 176.675000 0.824000 8.050000 27.760000 1.884000 1.000000 0.562000

8 rows × 53 columns


In [42]:
alt_model_columns = ['GR', 'ILD_log10',
       'DeltaPHI', 'PHIND', 'PE', 'NM_M', 'RELPOS', 'A1', 'SH', 'LM', 'B1',
       'B2', 'B3', 'B4', 'B5', 'C', 'above_delta_Formation',
       'above_delta_Depth', 'above_delta_GR', 'above_delta_ILD_log10',
       'above_delta_DeltaPHI', 'above_delta_PHIND', 'above_delta_PE',
       'above_delta_NM_M', 'above_delta_RELPOS', 'below_delta_Formation',
       'below_delta_Depth', 'below_delta_GR', 'below_delta_ILD_log10',
       'below_delta_DeltaPHI', 'below_delta_PHIND', 'below_delta_PE',
       'below_delta_NM_M', 'below_delta_RELPOS', 'formation_delta_Formation',
       'formation_delta_Depth', 'formation_delta_GR',
       'formation_delta_ILD_log10', 'formation_delta_DeltaPHI',
       'formation_delta_PHIND', 'formation_delta_PE', 'formation_delta_NM_M',
       'formation_delta_RELPOS', 'formation_delta_med_Formation',
       'formation_delta_med_Depth', 'formation_delta_med_GR',
       'formation_delta_med_ILD_log10', 'formation_delta_med_DeltaPHI',
       'formation_delta_med_PHIND', 'formation_delta_med_PE',
       'formation_delta_med_NM_M', 'formation_delta_med_RELPOS']

std_scaler = preprocessing.StandardScaler().fit(frame[alt_model_columns])
norm = std_scaler.transform(frame[alt_model_columns])

norm_frame = frame
for i, column in enumerate(alt_model_columns):
    norm_frame.loc[:, column] = norm[:, i]

frame = norm_frame
frame.describe()


Out[42]:
Depth GR ILD_log10 DeltaPHI PHIND PE NM_M RELPOS A1 SH ... formation_delta_RELPOS formation_delta_med_Formation formation_delta_med_Depth formation_delta_med_GR formation_delta_med_ILD_log10 formation_delta_med_DeltaPHI formation_delta_med_PHIND formation_delta_med_PE formation_delta_med_NM_M formation_delta_med_RELPOS
count 830.000000 8.300000e+02 8.300000e+02 8.300000e+02 8.300000e+02 8.300000e+02 8.300000e+02 8.300000e+02 8.300000e+02 8.300000e+02 ... 8.300000e+02 830.0 8.300000e+02 8.300000e+02 8.300000e+02 8.300000e+02 8.300000e+02 8.300000e+02 8.300000e+02 8.300000e+02
mean 2987.070482 -7.704680e-17 -1.198506e-16 -3.424302e-17 -3.210283e-16 2.739442e-16 1.369721e-16 -1.112898e-16 -1.027291e-16 5.136454e-17 ... 8.560756e-18 0.0 -3.424302e-17 1.712151e-17 -1.070094e-17 2.140189e-17 2.140189e-17 7.490661e-18 8.774775e-17 -3.424302e-17
std 94.391925 1.000603e+00 1.000603e+00 1.000603e+00 1.000603e+00 1.000603e+00 1.000603e+00 1.000603e+00 1.000603e+00 1.000603e+00 ... 1.000603e+00 0.0 1.000603e+00 1.000603e+00 1.000603e+00 1.000603e+00 1.000603e+00 1.000603e+00 1.000603e+00 1.000603e+00
min 2808.000000 -1.656627e+00 -3.935939e+00 -3.416269e+00 -1.889353e+00 -2.373228e+00 -1.452107e+00 -1.848086e+00 -6.024035e-01 -6.829576e-01 ... -1.794238e+00 0.0 -2.797537e+00 -1.826260e+00 -3.406552e+00 -5.376268e+00 -3.482929e+00 -5.158872e+00 -9.574299e+00 -1.837613e+00
25% 2911.625000 -7.574558e-01 -4.348191e-01 -7.095099e-01 -7.625206e-01 -7.432663e-01 -1.452107e+00 -8.335616e-01 -6.024035e-01 -6.829576e-01 ... -8.401789e-01 0.0 -5.045513e-01 -4.923518e-01 -4.419448e-01 -5.536867e-01 -5.651456e-01 -4.931162e-01 3.473144e-02 -8.401996e-01
50% 2993.750000 2.663538e-02 3.014625e-02 -1.321116e-01 -1.359673e-01 -2.135479e-01 6.886546e-01 4.133311e-02 -6.024035e-01 -6.829576e-01 ... -5.124788e-04 0.0 -1.212864e-03 -1.498533e-01 3.081250e-02 -3.303719e-02 -1.106695e-01 -1.021665e-03 3.473144e-02 -9.021482e-03
75% 3055.375000 5.612186e-01 6.399796e-01 5.081501e-01 6.050525e-01 8.274105e-01 6.886546e-01 8.561341e-01 1.660017e+00 1.464220e+00 ... 8.453020e-01 0.0 5.021255e-01 2.743649e-01 4.939950e-01 5.107522e-01 5.227272e-01 5.109474e-01 3.473144e-02 8.451947e-01
max 3160.500000 5.917647e+00 2.917095e+00 3.967453e+00 3.793968e+00 4.106583e+00 6.886546e-01 1.640888e+00 1.660017e+00 1.464220e+00 ... 1.871380e+00 0.0 2.795112e+00 7.199504e+00 3.975643e+00 3.353814e+00 6.392555e+00 3.992973e+00 9.643762e+00 2.021944e+00

8 rows × 53 columns


In [43]:
#--------------------------------------
# TPOT Exported Model

from sklearn.ensemble import ExtraTreesClassifier, VotingClassifier
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline, make_union
from sklearn.preprocessing import FunctionTransformer

exported_pipeline = make_pipeline(
    ExtraTreesClassifier(criterion="entropy", max_features=0.48, n_estimators=500)
)

exported_pipeline.fit(train[alt_model_columns], train['class'])


Out[43]:
Pipeline(steps=[('extratreesclassifier', ExtraTreesClassifier(bootstrap=False, class_weight=None, criterion='entropy',
           max_depth=None, max_features=0.48, max_leaf_nodes=None,
           min_impurity_split=1e-07, min_samples_leaf=1,
           min_samples_split=2, min_weight_fraction_leaf=0.0,
           n_estimators=500, n_jobs=1, oob_score=False, random_state=None,
           verbose=0, warm_start=False))])

In [44]:
frame['Facies'] = exported_pipeline.predict(frame[alt_model_columns])

In [52]:
frame['Facies']


Out[52]:
0      3
1      3
2      3
3      3
4      3
5      3
6      3
7      3
8      3
9      2
10     2
11     2
12     2
13     2
14     2
15     2
16     2
17     2
18     2
19     2
20     2
21     2
22     2
23     2
24     2
25     2
26     2
27     2
28     2
29     2
      ..
800    7
801    9
802    7
803    8
804    6
805    6
806    8
807    8
808    6
809    6
810    8
811    8
812    3
813    3
814    3
815    3
816    3
817    3
818    3
819    3
820    3
821    3
822    3
823    3
824    3
825    3
826    3
827    3
828    3
829    3
Name: Facies, dtype: int64

In [46]:
frame.to_csv('02 - Well Facies Prediction - Test Data Set.csv')